An automated method for accurate prediction of seizures is critical to
enhancing the quality of life of epileptic patients. While numerous existing
studies develop models and methods for efficient feature selection and
classification of electroencephalograph (EEG) data, recent studies
emphasize the development of ensemble learning methods to classify EEG
signals efficiently for effective detection of epileptic seizures. Since EEG
signals are non-stationary, traditional machine learning approaches may not
suffice for effective identification of epileptic seizures. This paper proposes a
hybrid ensemble learning framework that systematically combines
pre-processing methods with ensemble machine learning algorithms.
Specifically, principal component analysis (PCA) or t-distributed stochastic
neighbor embedding (t-SNE) is combined with k-means clustering, followed
by ensemble learning such as extreme gradient boosting (XGBoost) or
random forest. The selection of ensemble learning methods is justified by
comparing their mean average precision (MAP) scores with those of
well-known methodologies in the epileptic seizure detection domain when
applied to a real data set. The proposed hybrid framework is also compared
with other simple supervised machine learning algorithms using training sets
of varying size. Results suggest that the proposed approach achieves a
significant improvement in accuracy compared with other algorithms and
maintains stable classification accuracy even with small data sets.

1. INTRODUCTION

Epilepsy is one of the most prevalent chronic neurological disorders, affecting over 50 million
individuals of all ages worldwide, according to the World Health Organization (WHO) in 2019 [1]. Early
and effective diagnosis and treatment can help 70% of epileptic patients live seizure free. Epilepsy is
characterized by transient, abrupt, and periodic electrical instability in the cortical region of the brain.
The electroencephalograph (EEG) is a non-invasive method for detecting brain signals. Because the study of
EEG recordings is cumbersome and non-derivable, an automated seizure detection system is used to recognize
the specific EEG sections to review and analyze [2].

In [3], the author discusses an approach for seizure prediction and detection in the time, frequency,
and time-frequency domains using techniques such as independent component analysis (ICA) and principal
component analysis (PCA). A classifier using a multilayer perceptron neural network (MLPNN) is proposed
in [4]. Additional studies utilize a random ensemble learning approach [5]. Most recently, extreme gradient
boosting [6] has been considered. However, existing approaches lack an assessment within the ensemble
learning framework. To address these limitations, this paper proposes: i) a hybrid ensemble learning
framework that extracts significant features and classifies them as epileptic seizures; ii) a pre-processing
stage that uses dimensionality reduction techniques to extract features before machine learning (ML)
algorithms are applied, accommodating linearity and non-linearity between features through PCA and
t-distributed stochastic neighbor embedding (t-SNE); iii) a gradient-based ensemble machine learning method,
XGBoost [6], which combines the predictive analysis of multiple learning approaches and minimizes the error
in a sequential manner. A dataset acquired from the UCI ML repository [7] is used to demonstrate the
approach. Results suggest that the proposed hybrid approach achieves a similar or improved MAP score when
compared with frequently used techniques. Varying the training set size between 0.2 and 0.8 yields MAP
scores between 93% and 95%, allowing a smaller training size to be selected to further improve efficiency
through speed. The remainder of the paper is structured as follows: section 2 examines related works,
section 3 details the proposed hybrid framework, section 4 reviews existing ML methods, and section 5
describes the data used to assess the proposed approach. Section 6 provides the conclusion and future scope.


2. METHODS 

Various transform-based approaches have been recommended for automatic seizure identification,
analysis, and recognition [8]. Methodologies based on the discrete wavelet transform (DWT) and the Fourier
transform, combined with neural networks, are preferred and used in [9]. Frequency-domain studies on feature
extraction of epileptiform episodes are prevalent. Power spectral density (PSD) can be calculated using
parametric approaches [10]. When precise results were difficult to classify, a genetic algorithm was
constructed in [11].

Time-domain attributes are extracted as features using time-series analysis. Exponential energy,
entropy classes such as Shannon and Renyi, and energy-based features are used in [12]. Decomposition
methods such as the wavelet transform are applied through time-frequency analysis. The discrete cosine
transform (DCT) combined with wavelets, termed coslets, is efficient in identifying low-frequency components
over a multi-resolution scale [13]. Alterations in brain states are found by nonlinear methods [14], where
entropy and approximate entropy (ApEn) are extracted as features and a linear classifier is used. In [15],
intrinsic mode functions (IMFs) are obtained using empirical mode decomposition (EMD), and the IMF's energy,
instantaneous area, coefficient of variation, and fluctuation index are used as characteristics. In [16],
the linear prediction error energy (LPEE) is obtained by approximating the EEG signals.

Artificial neural networks are extensively used in the modeling of non-linear systems [17], [18]. Ren
and Wu [19] use a convolutional deep belief neural network, and Ubeyli [20] uses Lyapunov exponents with a
probabilistic neural network (PNN), for classification. Deep learning is currently being implemented in
seizure detection and, in combination with machine learning, has shown remarkable performance. In [21], deep
learning is given a wider scope for learning temporal patterns. Recurrent neural networks (RNN) can also be
used for EEG analysis [22]. In recent years, ML algorithms have been used for EEG signal acquisition, for
noise removal as signal pre-processing, and finally for classifying EEG signals.


3. THE PROPOSED HYBRID ENSEMBLE LEARNING FRAMEWORK 
Epilepsy is a neurological disorder that generates electrical activity that can be recorded.
Developing automatic systems to evaluate and diagnose it is essential. To address the challenges posed
by the non-stationarity of the signals, we propose a hybrid ensemble learning framework that improves the
classification accuracy regardless of the time-varying frequency in the data or the sample size. The steps
are as follows:
- Step 1: Denoising: For raw EEG data, the 0.3 Hz frequency range is selected by applying band pass filters.
- Step 2: Data Preparation: EEG data is highly unstructured with high variance, and hence standardization is
a requirement for machine learning algorithms. To achieve zero mean and unit variance, the standard scaler
approach is applied to each feature independently, and is mathematically expressed as:


$$Z = \frac{x - \mu}{\sigma} \tag{1}$$


where $\mu$ and $\sigma$ are the mean and standard deviation of a sample x.

- Step 3: Train/Test Split: The filtered data is divided into training and testing subsets. The training
subset is recommended to contain 60-80% of the filtered data.
- Step 4: Dimensionality Reduction: PCA and t-SNE are found to be efficient in removing less significant
features and in accommodating linearity and non-linearity between features in the compressed domain.
- Step 5: Clustering: k-means clustering partitions the data into k clusters with reference to the centroids.
- Step 6: Ensemble Learning: Selected features from Step 5 are then fitted through the XGBoost technique.
Because of its better computation speed and accuracy, XGBoost is considered better than the Random Forest
classifier.

In the proposed boosting technique, few layers are generated and the hypothesis is drawn with fewer
split trees, whereas in bagging techniques trees are allowed to grow to their maximum extent. PCA and k-means
clustering are applied to capture both spatial and spectral information in the reduced feature space.
These features are then subjected to distinct learning processes using the KNN and Random Forest approaches
and further classified.
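
Step 2 of the framework, the standardization in (1), can be illustrated with a minimal sketch. The function name `standardize` and the sample amplitudes are hypothetical, and the use of the population standard deviation (division by the number of samples) is an assumption of this sketch rather than a detail stated above.

```python
import math

def standardize(values):
    """Return z-scores (x - mean) / std for one feature column."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divide by n): an assumption of this sketch.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mean) / std for v in values]

# Hypothetical amplitudes of a single EEG feature across five samples.
z = standardize([135.0, 190.0, 229.0, 223.0, 192.0])
```

After the transformation, the feature has zero mean and unit variance, which is the property Step 2 asks for.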


4. MACHINE LEARNING ALGORITHMS 

The algorithms considered as part of the hybrid framework are summarized in this section. The aim of
the models is the best accuracy with only a slight delay in detecting seizures. Better results on biological
datasets can be achieved by identifying reasonable and important patterns. This section also describes the
other supervised and unsupervised ML methods that are presented and compared, namely PCA, logistic
regression, k-nearest neighbors (KNN), artificial neural network (ANN), random forest, and XGBoost. Data
pre-processing is the first step. Here, noise in the raw data originating from the human body, from
electrical fields, and from other interference must be removed, and the data is modelled as required by the
algorithm. Once pre-processing is done, classification techniques are implemented to categorize the signals
as epileptic or non-epileptic. The classification performance metrics are then reviewed and the results
compared.


4.1. Principal component analysis 
PCA is an exploratory data analysis method whose primary applications are feature extraction and
dimensionality reduction [23]. It is expressed as:


$$u_1^T S u_1 = \frac{1}{N} \sum_{n=1}^{N} \left( u_1^T x_n - u_1^T \bar{x} \right)^2 \tag{2}$$


where $u_1$ is a D-dimensional vector, $\bar{x}$ is the mean of the sample set, S is the data covariance
matrix, and N is the size of the sample space. In (2), the projected variance $u_1^T S u_1$ is maximized
with respect to $u_1$.
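
As an illustration of (2), the following minimal sketch estimates the first principal direction by power iteration on the covariance matrix. Power iteration, the function name, the unit-norm constraint on the direction, and the toy data are assumptions made for illustration; the solver actually used is not stated here.

```python
import math

def first_principal_component(X, iters=200):
    """Estimate the unit vector u1 that maximizes u1^T S u1 by power iteration."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Covariance matrix S = (1/N) * sum_n (x_n - mean)(x_n - mean)^T
    S = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
         for a in range(d)]
    u = [1.0 / math.sqrt(d)] * d  # assumed start, not orthogonal to u1 here
    for _ in range(iters):
        v = [sum(S[a][b] * u[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        u = [c / norm for c in v]
    variance = sum(u[a] * S[a][b] * u[b] for a in range(d) for b in range(d))
    return u, variance

# Hypothetical two-feature data lying close to the diagonal.
X = [[2.0, 1.9], [0.0, 0.1], [-2.0, -2.1], [1.0, 1.2], [-1.0, -1.1]]
u1, var1 = first_principal_component(X)
```

For this toy data the recovered direction is close to the diagonal, and the projected variance exceeds the variance of either original feature.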


4.2. t-distributed stochastic neighbor embedding 

PCA fails to visualize non-linear properties of the data, so t-SNE is used as an alternative, in which
the distance between two data points is converted to a probability using the Gaussian distribution function.
If i and j are any two data points, the Euclidean distance between i and j is converted to probabilities in
the high- and low-dimensional spaces using the following Gaussian distribution equations.


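$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)} \tag{3}$$

$$q_{j|i} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}{\sum_{k \neq i} \exp\left(-\lVert y_i - y_k \rVert^2\right)} \tag{4}$$

where $p_{j|i}$ and $q_{j|i}$ are the probability values for the high- and low-dimensional data, $y_i$ and
$y_j$ are the low-dimensional counterparts of $x_i$ and $x_j$, and $\sigma_i$ is the bandwidth of the
Gaussian centered on $x_i$. As the distance between two points increases, their probability decreases, so the
two points should not fall in the same cluster. Hence, a cost function based on the Kullback-Leibler (KL)
divergence between the two distributions is minimized, computed using the following equation.


$$C = \sum_i KL(P_i \parallel Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}} \tag{5}$$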
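
The Gaussian conditional probabilities and the KL cost described above can be sketched as follows. This is only an illustration under stated assumptions: a single fixed bandwidth `sigma` replaces per-point bandwidths, the function names and toy points are hypothetical, and the embedding optimization itself (as well as the Student-t kernel that full t-SNE uses in the low-dimensional space) is omitted.

```python
import math

def conditional_probs(points, sigma=1.0):
    """Row i holds the Gaussian affinities of point i, normalized over k != i."""
    n = len(points)
    P = []
    for i in range(n):
        w = [0.0 if j == i else
             math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                      / (2.0 * sigma ** 2))
             for j in range(n)]
        total = sum(w)
        P.append([v / total for v in w])
    return P

def kl_cost(P, Q):
    """Sum over i and j of p * log(p / q), skipping the zero diagonal."""
    return sum(p * math.log(p / q)
               for Pi, Qi in zip(P, Q) for p, q in zip(Pi, Qi) if p > 0.0)

# Hypothetical data: two tight pairs in 3-D and two candidate 2-D embeddings.
high = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [3.0, 3.0, 3.0], [3.1, 3.0, 2.9]]
good = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]  # neighbours kept
bad = [[0.0, 0.0], [3.0, 3.0], [0.1, 0.0], [3.1, 3.0]]   # neighbours swapped

P = conditional_probs(high)
# sigma = sqrt(0.5) makes the low-dimensional exponent equal to -||yi - yj||^2.
cost_good = kl_cost(P, conditional_probs(good, sigma=math.sqrt(0.5)))
cost_bad = kl_cost(P, conditional_probs(bad, sigma=math.sqrt(0.5)))
```

The embedding that preserves the neighbourhoods yields the lower cost, which is the behaviour the minimization relies on.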


4.3. K-means clustering 

This is an unsupervised learning technique in which attributes and labels are not used for prediction;
instead, it looks for features and then groups the data. The groups are formed from data features having
similar qualities. Kabir et al. [24] used a k-means clustering approach to cluster the EEG signal. The
algorithm uses the Euclidean distance as its metric. The steps followed by the k-means algorithm are:
i) Centroids are created by randomly selecting k (i.e., 2) points as cluster centers; ii) The distance to
each centroid is estimated and each data point is allocated to the nearest cluster; iii) A new cluster
center is found by averaging the allocated points; iv) Steps ii and iii are iterated until none of the
cluster allocations change.
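
The four steps above can be sketched as follows. The function name, the fixed random seed, the iteration cap, and the toy points are assumptions of this sketch, not part of the original procedure.

```python
import random

def kmeans(points, k=2, seed=0, max_iter=100):
    """Plain k-means with squared Euclidean distance."""
    rng = random.Random(seed)
    # Step i: pick k data points at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step ii: allocate every point to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Step iv: stop once no allocation changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step iii: move each centroid to the mean of its allocated points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical, well separated two-cluster data.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centroids = kmeans(points, k=2)
```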


4.4. Logistic regression 

Logistic regression is a statistical regression model [25] for categorical responses, which models the
log odds ratio of the posterior probability of the categorical response as a linear model of the explanatory
variables. Here x denotes a vector of explanatory variables and y ∈ {0, 1} denotes the binary output. The
logistic model is given as:

$$\log \frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)} = \beta_0 + \beta^T x \tag{6}$$


where $\beta_0$ is the intercept, and $\beta \in \mathbb{R}^p$ is a vector of coefficients for the p
variables. In general, let $(x_1, y_1), \ldots, (x_n, y_n)$ be a training sample. The model parameters are
identified by maximum likelihood estimation, where the log-likelihood for n observations is:


$$L(\beta_0, \beta) = \sum_{i=1}^{n} \left[ y_i (\beta_0 + \beta^T x_i) - \log\left(1 + \exp(\beta_0 + \beta^T x_i)\right) \right] \tag{7}$$
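
The log-likelihood in (7) can be maximized numerically. The sketch below uses plain gradient ascent, which is an assumption made for brevity since no optimizer is named here; the function names, learning rate, step count, and data are hypothetical.

```python
import math

def log_likelihood(b0, beta, X, y):
    """Sum over samples of y*eta - log(1 + exp(eta)), eta = b0 + beta.x"""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = b0 + sum(b * v for b, v in zip(beta, xi))
        total += yi * eta - math.log(1.0 + math.exp(eta))
    return total

def fit_logistic(X, y, lr=0.1, steps=500):
    """Maximize the log-likelihood by gradient ascent (assumed optimizer)."""
    b0, beta = 0.0, [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        g0, g = 0.0, [0.0] * len(beta)
        for xi, yi in zip(X, y):
            eta = b0 + sum(b * v for b, v in zip(beta, xi))
            resid = yi - 1.0 / (1.0 + math.exp(-eta))  # y - P(y = 1 | x)
            g0 += resid
            g = [gj + resid * v for gj, v in zip(g, xi)]
        b0 += lr * g0 / n
        beta = [b + lr * gj / n for b, gj in zip(beta, g)]
    return b0, beta

# Hypothetical one-feature data with overlapping classes.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]
b0, beta = fit_logistic(X, y)
ll_start = log_likelihood(0.0, [0.0], X, y)
ll_fit = log_likelihood(b0, beta, X, y)
```

Because the classes overlap, the maximum is finite and the fitted coefficients stay bounded.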


4.5. Support vector machine 
The SVM algorithm classifies the data using a hyperplane to tackle linear and nonlinear classification
and regression problems [26]. Assuming the data is linearly separable, the decision function is:


$$y(x) = w^T \phi(x) + b \tag{8}$$


where $\phi(x)$ denotes the fixed feature-space transformation, w is the M-dimensional vector, and b is the
bias parameter. The values of w and b for the optimal separating hyperplane are obtained from:


$$\min_{w, b} Q(w, b) = \frac{1}{2} \lVert w \rVert^2 \tag{9}$$
subject to $t_i \left( w^T \phi(x_i) + b \right) \geq 1$ for $i = 1, \ldots, M$, where $t_i \in \{-1, +1\}$ is
the class label of training sample $x_i$.


4.6. Naive Bayes 
The Naive Bayes technique applies Bayes' theorem to classification problems by calculating the
likelihood that the data in question (x) belongs to class C. It is mathematically expressed as:


$$P(C \mid X = x) = \frac{P(X = x \mid C)\, P(C)}{P(X = x)} \tag{10}$$

where P(X = x|C) is the conditional probability, P(C) is the state probability of class C, and P(X = x) is
the normalizing density. Let $Y_i$ be a discrete valued variable with discrete or real valued attributes
$X_i$ for i = 1, 2, ..., n, and let Y be the desired output probability distribution for each instance of X
to be classified.
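
A minimal sketch of (10) follows, assuming Gaussian class-conditional densities with independent features (the Gaussian NB variant listed in the results tables). The function names, the small variance floor, and the toy data are assumptions of this sketch. The normalizing density P(X = x) is the same for every class, so it is dropped when comparing classes.

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate the prior, per-feature means and variances for each class."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for c, rows in groups.items():
        prior = len(rows) / len(X)
        means = [sum(col) / len(rows) for col in zip(*rows)]
        # A small floor keeps the variance positive (assumption of this sketch).
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[c] = (prior, means, variances)
    return model

def predict_gnb(model, x):
    """Return the class with the largest log P(C) + log P(X = x | C)."""
    def log_posterior(c):
        prior, means, variances = model[c]
        log_lik = sum(-0.5 * math.log(2.0 * math.pi * var)
                      - (v - m) ** 2 / (2.0 * var)
                      for v, m, var in zip(x, means, variances))
        return math.log(prior) + log_lik
    return max(model, key=log_posterior)

# Hypothetical two-feature, two-class data.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.1], [5.0, 6.0], [5.2, 5.9], [4.9, 6.2]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gnb(X, y)
```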


4.7. K-nearest neighbor 

KNN is a nonparametric method for classification and regression in which the entire data set is
utilized for training. It captures the idea of similarity in terms of distance, proximity, or closeness, and
grouping is done by fixing the value of K. The Euclidean distance is used to estimate the distance between an
unknown sample and each training point, computed from the sample values of the EEG samples. The training
samples are then sorted in ascending order of distance, and the upper K rows of the sorted array are
selected. If K=1, the unknown sample is assigned the class of the nearest sample in the training set. A
distinctive property of KNN is that it can be updated with new data and continue to function well.
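
The procedure above (compute distances, sort in ascending order, keep the upper K rows) can be sketched as follows. The majority vote over the K selected labels, the function name, and the toy data are assumptions of this sketch.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the unknown sample to every training sample,
    # sorted in ascending order.
    ranked = sorted((math.dist(x, query), label)
                    for x, label in zip(train_X, train_y))
    top_k = [label for _, label in ranked[:k]]  # upper k rows of the sorted array
    return Counter(top_k).most_common(1)[0][0]

# Hypothetical two-feature training set with binary labels.
train_X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.1], [5.0, 6.0], [5.2, 5.9], [4.9, 6.2]]
train_y = [0, 0, 0, 1, 1, 1]
nearest = knn_predict(train_X, train_y, [1.1, 1.9], k=1)
voted = knn_predict(train_X, train_y, [5.0, 5.8], k=3)
```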


4.8. Random forest 

Random forest is an ensemble of decision trees built to be more robust and to limit overfitting and
errors. Feature selection is random, and the method is known to perform better when features are
categorical; random forest (RF) is also apt when a large number of variables are present. The test features
pass through the rules of each tree, and the algorithm then returns the predicted target. Random forest is a
bagging ensemble: it averages multiple trees built from randomly chosen subsets of the dataset, and is
therefore not typically regarded as a boosting method.


4.9. Proposed ensemble learning: XG boost technique 

The ensemble machine learning technique uses the gradient descent method to combine the predictive
analysis of various learning approaches in order to learn seizure features more optimally. It can be
developed by training models with the same learning algorithm or with diverse learning algorithms. Ensemble
learning can be broadly classified into bagging and boosting. The bagging method trains the model by
splitting the training data randomly across different trees, and the average of these trees is used for the
final prediction. In contrast, the boosting technique generates trees sequentially based on relevance
feedback in a closed loop: each subsequent tree learns from its predecessor, and the responses are
aggregated for the final classification. Unlike random forests, which have low bias and high variance and
build decision trees in parallel, XGBoost produces sequential decision trees as shown in Figure 1. XGBoost
initially has high bias and low variance, and obtains decision trees (weak classifiers) at different levels
by regularly updating feature weights. Finally, the weak classifiers are combined to reduce the bias and
increase the classifier efficiency. The low bias decision tree has a fixed number of levels (e.g. 3 to 4).


[Figure 1: Training samples with weights (w) pass through Decision Tree 1 to Decision Tree M in sequence
(Model 1 to Model M classifiers), with the weighted samples updated between trees; the outputs are combined
into the final classifier.]


Figure 1. Block diagram of XGBoost model showing sequential decision trees
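
The sequential scheme of Figure 1 can be illustrated with a simplified gradient boosting sketch in which each new depth-1 tree (stump) is fitted to the residuals left by its predecessors. This is not the XGBoost algorithm itself, which adds regularization and second-order gradient information; the squared-error loss, the function names, the learning rate, and the toy data are assumptions of this sketch.

```python
def fit_stump(x, residuals):
    """Best single-threshold split of a 1-D feature under squared error."""
    best = None
    for t in sorted(set(x))[:-1]:  # skip the maximum so both sides are non-empty
        left = [r for v, r in zip(x, residuals) if v <= t]
        right = [r for v, r in zip(x, residuals) if v > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]

def boost(x, y, rounds=20, lr=0.5):
    """Add stumps one after another, each fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        # Residuals are the negative gradient of the squared-error loss.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, residuals)
        stumps.append((t, lm, rm))
        pred = [p + lr * (lm if v <= t else rm) for v, p in zip(x, pred)]
    return base, stumps, pred

# Hypothetical 1-D feature with noisy binary labels.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]
base, stumps, pred = boost(x, y)
mse_base = sum((yi - base) ** 2 for yi in y) / len(y)
mse_final = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)
```

Each round lowers the training error of the combined model, which mirrors how the weak classifiers are combined to reduce bias.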


5. RESULTS AND DISCUSSION 

This section uses numerical examples to show the efficiency of the algorithms. The EEG dataset
obtained from the UCI ML repository [7] is subjected to the ML techniques, which are contrasted based on MAP
score. The EEG waveforms of the various classes in the UCI ML repository dataset [7] are shown in Figure 2.
The MAP score of each method is reported in Table 1, where Random Forest and XGBoost perform best. The train
and test sizes are varied from 0.2 to 0.8 of the samples in steps of 0.1, and the MAP scores of various ML
techniques using PCA+k-means and tSNE+k-means with varying training set size are reported in Tables 2 and 3.


5.1. Experimental procedure and MAP score 

The UCI-ML EEG dataset [7] consists of 11,500 samples with 179 data points of duration 1 s,
collected from 100 individuals subjected to 23.6 seconds of recording. The last column represents the
response variable as label Y ∈ {1, 2, 3, 4, 5} and the remaining columns denote the explanatory variables
X ∈ {x1, ..., x178}. Data with Y = 1 are from patients with epileptic seizure (i.e., class 1) and data with
Y ∈ {2, 3, 4, 5} are non-epileptic (classes 2, 3, 4, 5), as demonstrated in Figure 2. The UCI-ML EEG dataset
is split into train and test samples and pre-processed, and significant features are extracted by applying
PCA or t-SNE in conjunction with k-means clustering. The extracted features are processed and classified
through ensemble learning methods including random forest and XGBoost. Performance is evaluated using the
MAP score and results are recorded for each instance. These results are validated using K-fold cross
validation.

The mean average precision (MAP) is calculated using the average precision (AP) as its unit:


$$AP = \sum_{i=0}^{n-1} \left[ \text{Recall}(i) - \text{Recall}(i + 1) \right] \times \text{Precision}(i) \tag{11}$$
In (11), n denotes the number of thresholds, with Recall(n) = 0 and Precision(n) = 1.
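
The computation in (11) can be sketched as follows. It is assumed here that the thresholds are the distinct classifier scores taken in ascending order, so that recall decreases with the index; the function names and the toy labels and scores are hypothetical.

```python
def average_precision(y_true, scores):
    """AP as the recall-weighted sum of precisions over n score thresholds."""
    thresholds = sorted(set(scores))  # n thresholds, ascending (assumption)
    total_pos = sum(y_true)
    precision, recall = [], []
    for t in thresholds:
        predicted = [s >= t for s in scores]
        tp = sum(1 for p, yt in zip(predicted, y_true) if p and yt == 1)
        precision.append(tp / sum(predicted))
        recall.append(tp / total_pos)
    precision.append(1.0)  # Precision(n) = 1
    recall.append(0.0)     # Recall(n) = 0
    return sum((recall[i] - recall[i + 1]) * precision[i]
               for i in range(len(thresholds)))

def mean_average_precision(ap_values):
    """MAP as the mean of AP values, e.g. one per class or per fold."""
    return sum(ap_values) / len(ap_values)

# Hypothetical labels (1 = seizure) and classifier scores.
ap = average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
ap_perfect = average_precision([0, 1], [0.2, 0.9])
map_score = mean_average_precision([ap, ap_perfect])
```

For the first toy example, the only non-zero terms come from the thresholds at which recall drops, giving 0.5 × 2/3 + 0.5 × 1 = 5/6.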


5.2. Comparison of MAP score 

To assess the efficiency of the suggested method, existing approaches for assessing EEG data are
identified from the literature, where we split x% of the dataset into training and the remainder into
testing. The dataset goes through two stages of processing:
- (S.1) Feature extraction with PCA followed by k-means clustering.
- (S.2) Random Forest and XGBoost are applied to the features selected in step (S.1) and MAP scores are
computed.

Table 1 reports the MAP scores. Results of the proposed approach are highlighted, and all methods
achieved a MAP score of approximately 80% or higher. While the scores of LDA, logistic regression, PCA, and
SVM improve successively by 2% or more, ANN, Naive Bayes, and KNN achieve similar scores. The proposed
ensemble learning approaches, however, have a higher MAP score, suggesting high classification accuracy.

Figure 2. EEG signals depicting 5 different classes with samples


Table 1. Comparison of suggested and existing approaches using MAP scores on data from epileptic seizures

Methods                          MAP Score (%)
LDA                              79.20
Logistic regression              82.83
Principal component analysis     89.00
Wavelet + SVM                    92.30
ANN                              93.00
Wavelet + Gaussian Naive Bayes   93.21
KNN                              93.96
Random Forest Classifier         94.92
XGBoost                          94.10


5.3. Performance analysis 

To evaluate the efficiency of the suggested approach, other methods that exhibited significant MAP
scores, such as logistic regression, KNN, SVM, and naive Bayes, are applied to the UCI-ML EEG dataset with
the hybrid pre-processing. The dataset is split into train and test subsets, with the test size varying
from 0.2 to 0.8 of the samples in steps of 0.1, and the proposed ensemble learning model is allowed to learn
from the remaining feature vectors.

Table 2 shows the MAP scores of various ML techniques using PCA+k-means with varying sizes of the
train and test subsets. Numbers reported in Table 2 are rounded down to the nearest integer. Table 2 suggests
that the MAP score does not vary much across different train sizes, suggesting consistent accuracy even when
the algorithms are trained on a small training subset. This is because of the stability of the
pre-processing stage involving clustering and PCA, which enables selection of significant features from
clusters. SVM has smaller scores compared to the other methods; overall, however, random forest, XGBoost,
and Gaussian NB consistently achieve a MAP score over 93% regardless of the size of the data set.


Table 2. MAP scores of various ML techniques using PCA+k-means with varying training set
MAP score (%): PCA+k-means

ML Methods       Train size
                 0.2   0.3   0.4   0.5   0.6   0.7   0.8
SVM              4     74    B     B     72    72    65
LR               89    89    89    89    89    89    85
KNN              94    94    93    94    94    94    94
Random Forest    93    93    94    94    94    94    94
Gaussian NB      93    93    93    94    93    9%    %4
XGBoost          94    94    94    94    94    94    95    %4

Table 3 shows the MAP scores of the algorithms considered in Table 2 where PCA is replaced with tSNE,
which handles feature extraction in data sets with small sample size. The results show a trend similar to
that reported in Table 2. Comparing individual MAP scores between the two tables, tSNE does not seem to
improve the accuracy significantly. In both Tables 2 and 3, a smaller training set size can be selected to
improve the overall computation speed.


Table 3. MAP scores of various ML techniques using tSNE+k-means with varying train set
MAP score (%): tSNE+k-means

ML Methods       MAP Score (%) for Varying train size
                 0.2   0.3   0.4   0.5   0.6   0.7   0.8
SVM              65    64    63    63    62    63    62
LR               84    85    81    86    80    85    81
KNN              92    93    93    93    93    94    95
Random Forest    93    94    94    95    95    95    95
Gaussian NB      80    81    82    87    81    86    83
XGBoost          94    92    92    92    93    93    93    95


6. CONCLUSION 

Due to its massively parallel and distributed structure of computation, the proposed ensemble
learning technique has proved more effective in learning random patterns of seizures, providing better
estimates of epilepsy in highly non-linear EEG signals. The proposed boosting type uses the gradient descent
method to reduce the loss and generates a single model that gives better performance in comparison with the
bagging type and other conventional unimodel ML techniques. Moreover, the proposed hybrid framework achieved
high and consistent accuracy even with small data sets. As future work, better accuracy in detecting
pre-ictal regions may be obtained using an appropriate method such as an RNN, where the pre-processing
techniques or the configuration of the RNN LSTM need to be adjusted. Genetic algorithms can be used to find
dominant features for detecting pre-ictal periods. Appropriate methods for dimensionality reduction can be
implemented to eliminate redundant features.